MiWoCI Workshop - 2017

Authors

  • Frank-Michael Schleif
  • Thomas Villmann
Abstract

We propose a method for learning clusters of pharmacokinetic models, demonstrated on a clinical data set investigating 11β-HSD1 activity in healthy adults. Prednisone has the same affinity for 11β-HSD1 as cortisone, and the interconversion of oral prednisone to prednisolone has been used as a marker of enzyme activity. The parameters of the multi-compartment ordinary differential equation model are studied via identifiability analysis with respect to the observable measurements, and this analysis is used to interpret the learned clusters. We approximate the model using the perturbation method, which enables very efficient training of the proposed Gaussian mixture clustering technique optimized by Expectation Maximization (EM). Training on the clinical data, based on venous blood samples taken at 20-minute intervals over a period of 4 hours, results in 4 clusters reflecting the prednisone conversion rate. The learned clusters differ in prednisone absorption as well as in the prednisone/prednisolone conversion rate, as can be seen from an analysis of the learned parameter relationships. Consulting further satellite data for each person, not used for training, reveals a correlation between cluster membership and total fat mass.

MiWoCI Workshop 2017, Machine Learning Reports

Computer-aided diagnosis of inborn steroidogenic disorders

Sreejita Ghosh, Elizabeth Sarah Baranowski, Michael Biehl, Wiebke Arlt, Peter Tino, Kerstin Bunte

1 University of Groningen, JBI of Mathematics and Computer Science, NL
2 University of Birmingham, IMSR, UK
3 University of Birmingham, School of Computer Science, UK

Abstract: Due to improved biochemical sensor technology, both the amount of complex biomedical data and the demand for automated, interpretable, interdisciplinary analysis techniques are increasing. However, biomedical data pose three challenges: 1) heterogeneous measurements, 2) missing values, and 3) imbalanced classes. Class imbalance becomes especially prominent for patients with rare diseases.
For such datasets, even if all patients are misclassified as healthy, the overall accuracy may still be close to ninety percent. Optimizing the overall accuracy of the classifier is therefore not enough; what is particularly desirable is a high detection rate for the minority classes. Our dataset concerns rare inborn steroidogenic disorders, which are caused by specific genetic mutations and lead to defective production of one of the enzymes, or of a cofactor, responsible for catalysing the steroid hormone synthesis that underlies salt and glucose homeostasis, sex differentiation and sex-specific development. Inborn steroidogenic disorders need to be diagnosed as early as possible, to avoid delaying life-saving glucocorticoid therapy for adrenal insufficiency and to facilitate gender allocation and surgical planning in patients with disordered sex development. The dataset consists of urine GC/MS measurements from 829 healthy controls (305 under 1 year of age) and 118 genetically confirmed patients with steroidogenic disorders. Each sample is represented as a 165-dimensional vector of ratios between 34 distinct steroid metabolite concentrations, constructed using domain knowledge [3]. Bunte et al. [1] introduced an approach for the computer-aided diagnosis of the most prevalent condition, 21-hydroxylase deficiency (CYP21A2), and of two other representative conditions, 5α-reductase type 2 deficiency (SRD5A2) and P450 oxidoreductase deficiency (PORD), that simultaneously handles missing and heterogeneous measurements in the urine data. In Ghosh et al. [3] we investigated two main strategies for learning from imbalanced data: 1) penalizing the misclassification of a diseased sample as healthy more severely than misclassification between diseases, and 2) re-sampling the original dataset by under-sampling the majority class and/or over-sampling the minority classes following Chawla et al. [2].
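
The accuracy argument above can be checked with the class sizes of this dataset. The sketch below evaluates a trivial classifier that calls every sample healthy and reports overall accuracy, balanced accuracy (mean per-class recall) and a weighted error in the spirit of strategy 1. Assumptions: the 118 patients are pooled into a single "patient" label, and the cost of 10.0 for calling a patient healthy is a hypothetical value, not the weighting used in Ghosh et al. [3]. All function names are illustrative.

```python
from collections import Counter

def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_recall(y_true, y_pred):
    """Detection rate (recall) of every class that occurs in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in totals.items()}

def misclassification_cost(true, pred, healthy="healthy", missed_disease_cost=10.0):
    """Hypothetical cost: 0 if correct, a large penalty if a patient is called healthy, 1 otherwise."""
    if true == pred:
        return 0.0
    if true != healthy and pred == healthy:
        return missed_disease_cost
    return 1.0

# Class sizes from the abstract; all 118 patients pooled into one label (simplification).
y_true = ["healthy"] * 829 + ["patient"] * 118
y_pred = ["healthy"] * len(y_true)  # trivial classifier that calls everyone healthy

acc = overall_accuracy(y_true, y_pred)
recall = per_class_recall(y_true, y_pred)
balanced = sum(recall.values()) / len(recall)
cost = sum(misclassification_cost(t, p) for t, p in zip(y_true, y_pred))
print(f"overall accuracy:  {acc:.3f}")       # → 0.875 (829 / 947)
print(f"balanced accuracy: {balanced:.3f}")  # → 0.500
print(f"weighted cost:     {cost}")          # → 1180.0 (118 missed patients x 10.0)
```

The trivial classifier reaches about 87.5% overall accuracy while detecting none of the patients, which is exactly why the per-class detection rate, or a cost that weights missed diagnoses heavily, is the more informative target.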
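
Over-sampling following Chawla et al. [2] presumably refers to SMOTE (synthetic minority over-sampling), which creates new minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below assumes that reading and pairs it with plain random under-sampling of the majority class. It is a compact variant (base samples are drawn at random rather than a fixed number per sample), it assumes complete vectors without missing values, and the function names and toy data are hypothetical.

```python
import math
import random

def k_nearest(points, i, k):
    """Indices of the k points closest (Euclidean) to points[i], excluding i itself."""
    ranked = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in ranked[:k]]

def smote(minority, n_synthetic, k=5, rng=None):
    """SMOTE-style over-sampling: interpolate between a minority sample and a nearby minority sample."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = rng or random.Random()
    k = min(k, len(minority) - 1)
    neighbours = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # base sample
        j = rng.choice(neighbours[i])     # one of its k nearest minority neighbours
        gap = rng.random()                # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic

def random_undersample(majority, n_keep, rng=None):
    """Keep a random subset of n_keep majority-class samples."""
    rng = rng or random.Random()
    return rng.sample(majority, n_keep)

rng = random.Random(0)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up 2-D points
majority = [[float(i), 5.0] for i in range(20)]
new_points = smote(minority, n_synthetic=6, k=2, rng=rng)
kept = random_undersample(majority, n_keep=8, rng=rng)
print(len(minority) + len(new_points), len(kept))  # → 10 8
```

Because every synthetic point lies on a segment between two existing minority samples, over-sampling stays inside the region the minority class already occupies instead of merely duplicating points.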
As classifiers, we used two variants of Learning Vector Quantization (LVQ) that can deal with missing values: NaNLVQ and Angle-LVQ. In Ghosh et al. [3] we used only relevance vectors. As a next step, we investigated two- and three-dimensional global relevance matrices in the Angle-LVQ classifier, followed by the corresponding local matrices, to see whether these higher dimensions and more complex models yield further insights. From the 2D and 3D global matrices of Angle-LVQ we obtained Disease Maps and Disease Globes, respectively, i.e. projections of the samples by these matrices. The Disease Globes were then flattened into maps using the Mollweide projection. Comparing the relevance profiles obtained from the local and global matrices gave us a better idea of the disease-specific blockages in the steroidogenic pathway (extraction of important decision boundaries). Such an understanding will help us create a system for personalized medicine and individual treatment titration. In this workshop we would like to discuss the results of the above experiments and the issues we are trying to solve.
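
The Disease Globe step can be sketched as three small functions, under stated assumptions: a 3 x D global matrix `omega` maps a sample to R^3 (the matrix below is made up, and Angle-LVQ training is not shown); the projected vector is normalized onto the unit sphere (our assumption, motivated by Angle-LVQ's angle-based dissimilarity); its direction is converted to longitude and latitude; and the standard Mollweide equations, x = (2√2/π) λ cos θ and y = √2 sin θ with 2θ + sin 2θ = π sin φ, flatten it to the plane. Missing values are not handled, and all names are hypothetical.

```python
import math

def project(omega, x):
    """Map a D-dimensional sample x to R^3 with a 3 x D matrix omega (a list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in omega]

def to_lon_lat(v):
    """Direction of a 3-D vector as (longitude, latitude) in radians on the unit sphere."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("the zero vector has no direction")
    a, b, c = (comp / norm for comp in v)
    return math.atan2(b, a), math.asin(max(-1.0, min(1.0, c)))

def mollweide(lon, lat, iterations=100):
    """Mollweide (x, y) on a unit-radius sphere, central meridian at lon = 0.

    The auxiliary angle theta solves 2*theta + sin(2*theta) = pi*sin(lat). The left side
    is non-decreasing on [-pi/2, pi/2], so bisection always converges, including at the
    poles, where the derivative used by Newton's method vanishes.
    """
    target = math.pi * math.sin(lat)
    lo, hi = -math.pi / 2, math.pi / 2
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if 2 * mid + math.sin(2 * mid) < target:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    x = 2 * math.sqrt(2) / math.pi * lon * math.cos(theta)
    y = math.sqrt(2) * math.sin(theta)
    return x, y

# Hypothetical 3 x 4 matrix and sample, for illustration only (not a trained Angle-LVQ model).
omega = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 1.0]]
sample = [0.0, 0.0, 0.5, 0.5]
lon, lat = to_lon_lat(project(omega, sample))
x, y = mollweide(lon, lat)
print(round(x, 6), round(y, 6))  # → 0.0 1.414214 (the sample lands on the north pole)
```

Mollweide is an equal-area projection, so regions of the globe keep their relative sizes on the flattened map; the whole sphere maps into an ellipse of width 4√2 and height 2√2.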
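
The first abstract clusters subjects with a Gaussian mixture trained by Expectation Maximization. Its mixture components are perturbation-approximated multi-compartment ODE models, which the abstract does not specify, so the sketch below shows only the generic EM loop for a one-dimensional Gaussian mixture, as if each subject were summarized by a single hypothetical scalar (for example an estimated conversion rate). The data, the function names and the quantile-based initialization are illustrative assumptions, not the authors' method.

```python
import math

def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def responsibilities(x, weights, means, variances):
    """E-step for one point: posterior probability of each component (log-sum-exp for stability)."""
    logp = [math.log(w) + log_normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    top = max(logp)
    p = [math.exp(lp - top) for lp in logp]
    total = sum(p)
    return [q / total for q in p]

def em_gmm_1d(data, k, iterations=200, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture by Expectation Maximization."""
    n = len(data)
    ordered = sorted(data)
    # Deterministic start: means at evenly spaced quantiles, shared variance, equal weights.
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    mu = sum(data) / n
    variances = [max(sum((x - mu) ** 2 for x in data) / n, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        resp = [responsibilities(x, weights, means, variances) for x in data]  # E-step
        for j in range(k):                                                      # M-step
            nj = sum(r[j] for r in resp)
            if nj == 0.0:
                continue  # an empty component keeps its previous parameters
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, min_var)
    return weights, means, variances

def hard_label(x, weights, means, variances):
    """Cluster membership: the component with the largest posterior probability."""
    r = responsibilities(x, weights, means, variances)
    return r.index(max(r))

data = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 5.0, 5.1, 4.9, 5.05, 4.95, 5.0]  # made-up values
weights, means, variances = em_gmm_1d(data, k=2)
labels = [hard_label(x, weights, means, variances) for x in data]
print([round(m, 3) for m in means])  # → [1.0, 5.0]
print(labels)                        # → [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```

The E-step assigns each point a soft membership, and the M-step re-estimates weights, means and variances as responsibility-weighted averages; in the abstract's setting, the same alternation would update model parameters per cluster instead of scalar means.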

Similar sources

MIWOCI Workshop - 2013

Nonlinear dimensionality reduction (DR) techniques offer the possibility to visually inspect a high-dimensional data set in two dimensions, and such methods have recently been extended to also visualize class boundaries as induced by a trained classifier on the data. In this contribution, we investigate the effect of two different ways to shape the involved dimensionality reduction technique in...

ECIR WORKSHOP REPORT Report on the 5th International Workshop on Bibliometric-enhanced Information Retrieval

This workshop report presents the output of the fifth Bibliometric-enhanced Information Retrieval (BIR) workshop, which has been co-located with the 39th European Conference on Information Retrieval (ECIR 2017) in Aberdeen, UK. We motivate our workshop and outline the papers (one keynote, six regular papers and five poster papers) presented at BIR 2017. Finally, we conclude with an outlook and ...

Diabetic foot workshop: Improving technical and educational skills for nurses

Diabetes mellitus as one of the most common metabolic disorders has some complications, one of the main ones is diabetic foot (DF). Appropriate care and education prevents 85% of diabetic foot amputations. An ideal management to prevent and treat diabetic foot necessitates a close collaboration between the health team members and the diabetic patient. Therefore, improving nurses' knowledge a...

Conference report on the Indo Global Summit on Head and Neck Oncology (IGSHNO 2017-BMCON-IV), 24–26 February 2017, Jaipur, India

'The multidisciplinary approach: expanding treatment horizons for head and neck cancer' was the major theme of the Indo Global Summit on Head and Neck Oncology (IGSHNO 2017-BMCON-IV). The meeting, held in Jaipur (Rajasthan, India) from 24 to 26 February 2017, assembled 600 participants from India and worldwide. It was organised by the Bhagwan Mahaveer Cancer Hospital and Research Centre (BMCHRC...

A comparison and analysis of the use of metaheuristic algorithms for solving job shop production scheduling problems

One of the most important problems in research and applied fields of production management is a suitable scheduling for different operations. So, there are many approaches for job workshop or job non-workshop scheduling problems. Since job workshop scheduling problems (JSP) belong to NP-Hard class, some metaheuristics methods such as Tabu Search, Simulated Annealing, Genetic Algorithm and Parti...

SAVE-SD 2017: Third Workshop on Semantics, Analytics and Visualisation: Enhancing Scholarly Data

The third edition of the Workshop on Semantics, Analytics and Visualisation: Enhancing Scholarly Data (SAVE-SD 2017) is taking place in Perth, Australia on the 3rd of April 2017, co-located with the 26th International World Wide Web Conference. The main goal of the workshop is to provide a venue for researchers, publishers and other companies to engage in discussions about semantics, analytics ...

Publication date: 2017